我们研究了设计AI代理商的问题,该代理可以学习有效地与潜在的次优伴侣有效合作,同时无法访问联合奖励功能。这个问题被建模为合作焦论双代理马尔可夫决策过程。我们假设仅在游戏的Stackelberg制定中的两个代理中的第一个控制,其中第二代理正在作用,以便在鉴于第一代理的政策给出预期的效用。第一个代理人应该如何尽快学习联合奖励功能,因此联合政策尽可能接近最佳?在本文中,我们分析了如何在这一交互式的两个代理方案中获得对奖励函数的知识。我们展示当学习代理的策略对转换函数有显着影响时,可以有效地学习奖励功能。
translated by 谷歌翻译
通常,根据某些固有的价值衡量标准,绩效是定义的。相反,我们考虑一个个人的价值为\ emph {相对}的设置:当决策者(DM)选择一组从人口中的个人来最大化预期效用时,自然考虑\ emph {预期的边际贡献}(每个人的emc)。我们表明,这个概念满足了这种环境公平性的公理定义。我们还表明,对于某些政策结构,这种公平概念与最大化的预期效用保持一致,而对于线性实用程序功能,它与Shapley值相同。但是,对于某些自然政策,例如选择具有一组特定属性的个人的政策(例如,大学入学的足够高考试成绩),精英级和公用事业最大化之间存在权衡。我们根据挪威大学的大学录取和成果,分析了限制对政策对效用和公平性的影响。
translated by 谷歌翻译
在本文中,我们考虑了增强学习(RL)中对风险敏感的顺序决策。我们的贡献是两个方面。首先,我们介绍了一种新颖而连贯的风险量化,即复合风险,该风险量化了学习过程中综合和认知风险的关节作用。现有的作品单独被视为综合性或认知风险,或作为添加剂组合。我们证明,当认知风险措施被期望取代时,添加剂配方是复合风险的特殊情况。因此,综合风险比单个和添加剂配方对伴侣和认知不确定性更敏感。我们还基于集合引导和分布RL提出了一种算法,Sentinel-K,分别代表认知和差异不确定性。 K Learners的合奏使用遵循正规领导者(FTRL)来汇总分布并获得综合风险。我们通过实验验证了Sentinel-K可以更好地估计回报分布,并且与复合风险估计相比,与最新风险敏感和分布RL算法相比,对风险敏感的性能更高。
translated by 谷歌翻译
Given a large graph with few node labels, how can we (a) identify the mixed network-effect of the graph and (b) predict the unknown labels accurately and efficiently? This work proposes Network Effect Analysis (NEA) and UltraProp, which are based on two insights: (a) the network-effect (NE) insight: a graph can exhibit not only one of homophily and heterophily, but also both or none in a label-wise manner, and (b) the neighbor-differentiation (ND) insight: neighbors have different degrees of influence on the target node based on the strength of connections. NEA provides a statistical test to check whether a graph exhibits network-effect or not, and surprisingly discovers the absence of NE in many real-world graphs known to have heterophily. UltraProp solves the node classification problem with notable advantages: (a) Accurate, thanks to the network-effect (NE) and neighbor-differentiation (ND) insights; (b) Explainable, precisely estimating the compatibility matrix; (c) Scalable, being linear with the input size and handling graphs with millions of nodes; and (d) Principled, with closed-form formula and theoretical guarantee. Applied on eight real-world graph datasets, UltraProp outperforms top competitors in terms of accuracy and run time, requiring only stock CPU servers. On a large real-world graph with 1.6M nodes and 22.3M edges, UltraProp achieves more than 9 times speedup (12 minutes vs. 2 hours) compared to most competitors.
translated by 谷歌翻译
High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-Supervised learning methods, which learn from automatically generated labels has shown great success on natural images, offer an attractive alternative also to microscopy images. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the undesirable domain shifts present in the data known as batch effects, which may be caused by biological noise or uncontrolled experimental conditions. To this end, we introduce Cross-Domain Consistency Learning (CDCL), a novel approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, which leads to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks - such as distinguishing treatments and mode of action.
translated by 谷歌翻译
While risk-neutral reinforcement learning has shown experimental success in a number of applications, it is well-known to be non-robust with respect to noise and perturbations in the parameters of the system. For this reason, risk-sensitive reinforcement learning algorithms have been studied to introduce robustness and sample efficiency, and lead to better real-life performance. In this work, we introduce new model-free risk-sensitive reinforcement learning algorithms as variations of widely-used Policy Gradient algorithms with similar implementation properties. In particular, we study the effect of exponential criteria on the risk-sensitivity of the policy of a reinforcement learning agent, and develop variants of the Monte Carlo Policy Gradient algorithm and the online (temporal-difference) Actor-Critic algorithm. Analytical results showcase that the use of exponential criteria generalize commonly used ad-hoc regularization approaches. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
translated by 谷歌翻译
Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, we introduce a general-purpose hierarchical learning architecture that is based on the progressive partitioning of a possibly multi-resolution data space. The optimal partition is gradually approximated by solving a sequence of optimization sub-problems that yield a sequence of partitions with increasing number of subsets. We show that the solution of each optimization problem can be estimated online using gradient-free stochastic approximation updates. As a consequence, a function approximation problem can be defined within each subset of the partition and solved using the theory of two-timescale stochastic approximation algorithms. This simulates an annealing process and defines a robust and interpretable heuristic method to gradually increase the complexity of the learning architecture in a task-agnostic manner, giving emphasis to regions of the data space that are considered more important according to a predefined criterion. Finally, by imposing a tree structure in the progression of the partitions, we provide a means to incorporate potential multi-resolution structure of the data space into this approach, significantly reducing its complexity, while introducing hierarchical feature extraction properties similar to certain classes of deep learning architectures. Asymptotic convergence analysis and experimental results are provided for clustering, classification, and regression problems.
translated by 谷歌翻译
Being able to forecast the popularity of new garment designs is very important in an industry as fast paced as fashion, both in terms of profitability and reducing the problem of unsold inventory. Here, we attempt to address this task in order to provide informative forecasts to fashion designers within a virtual reality designer application that will allow them to fine tune their creations based on current consumer preferences within an interactive and immersive environment. To achieve this we have to deal with the following central challenges: (1) the proposed method should not hinder the creative process and thus it has to rely only on the garment's visual characteristics, (2) the new garment lacks historical data from which to extrapolate their future popularity and (3) fashion trends in general are highly dynamical. To this end, we develop a computer vision pipeline fine tuned on fashion imagery in order to extract relevant visual features along with the category and attributes of the garment. We propose a hierarchical label sharing (HLS) pipeline for automatically capturing hierarchical relations among fashion categories and attributes. Moreover, we propose MuQAR, a Multimodal Quasi-AutoRegressive neural network that forecasts the popularity of new garments by combining their visual features and categorical features while an autoregressive neural network is modelling the popularity time series of the garment's category and attributes. Both the proposed HLS and MuQAR prove capable of surpassing the current state-of-the-art in key benchmark datasets, DeepFashion for image classification and VISUELLE for new garment sales forecasting.
translated by 谷歌翻译
As aerial robots are tasked to navigate environments of increased complexity, embedding collision tolerance in their design becomes important. In this survey we review the current state-of-the-art within the niche field of collision-tolerant micro aerial vehicles and present different design approaches identified in the literature, as well as methods that have focused on autonomy functionalities that exploit collision resilience. Subsequently, we discuss the relevance to biological systems and provide our view on key directions of future fruitful research.
translated by 谷歌翻译
The Forster transform is a method of regularizing a dataset by placing it in {\em radial isotropic position} while maintaining some of its essential properties. Forster transforms have played a key role in a diverse range of settings spanning computer science and functional analysis. Prior work had given {\em weakly} polynomial time algorithms for computing Forster transforms, when they exist. Our main result is the first {\em strongly polynomial time} algorithm to compute an approximate Forster transform of a given dataset or certify that no such transformation exists. By leveraging our strongly polynomial Forster algorithm, we obtain the first strongly polynomial time algorithm for {\em distribution-free} PAC learning of halfspaces. This learning result is surprising because {\em proper} PAC learning of halfspaces is {\em equivalent} to linear programming. Our learning approach extends to give a strongly polynomial halfspace learner in the presence of random classification noise and, more generally, Massart noise.
translated by 谷歌翻译